-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UR][Bindless] Initial implementation of bindless images for HIP #2496
base: main
Are you sure you want to change the base?
Conversation
d7d669e
to
a89b779
Compare
TODO: Complete the description with more information. Signed-off-by: Georgi Mirazchiyski <[email protected]> Co-authored-by: Peter Žužek <[email protected]>
a89b779
to
0912559
Compare
@oneapi-src/unified-runtime-hip-write , @oneapi-src/unified-runtime-bindless-images-write Heyo, that's a whole lot of changes and additions, so I am just pinging as a reminder in case it was lost in messages over the Winter holidays. There is no rush in reviewing immediately, I just wanted to remind about it. Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I was able to give the most thorough review with respect to specific HIP APIs or all of the ins and outs of the image properties and logic concerning rows and pitches and so on.
return static_cast<ur_sampler_addressing_mode_t>((Props >> 2) & 0b111); | ||
} | ||
|
||
ur_sampler_addressing_mode_t getAddressingModeDim(size_t i) const noexcept { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert i < 3
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good eye! Sure.
@@ -88,6 +95,13 @@ struct ur_device_handle_t_ { | |||
int getConcurrentManagedAccess() const noexcept { | |||
return ConcurrentManagedAccess; | |||
}; | |||
|
|||
bool supportsHardwareImages() const noexcept { | |||
return HardwareImageSupport ? true : false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a little clunky. I see that hipDeviceAttributeImageSupport
returns ‘1’ if Device supports image, ‘0’ otherwise.
(source).
Perhaps we could store HardwareImageSupport
as a boolean, and format the return value of hipDeviceGetAttribute
once?
// Check if texture functions are supported in the HIP host runtime.
int ret;
UR_CHECK_ERROR(hipDeviceGetAttribute(
&ret, hipDeviceAttributeImageSupport, HIPDevice));
detail::ur::assertion(ret == 0 || ret == 1);
/*(bool)*/ HardwareImageSupport = ret == 1;
? Or just return HardwareImageSupport == 1
here I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, there's another one (attribute member) stored as integer but used as boolean too here. Anyways, since this one is going to be used as boolean in any situation I can imagine, it may make more sense to just store as boolean. I'll do that. Thanks.
} | ||
|
||
// Used for bookkeeping for mipmapped array leaks in mapping external memory. | ||
std::map<hipArray_t, hipMipmappedArray_t> ChildHipArrayFromMipmapMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the order important? Would a std::unordered_map
bring any benefits (it's usually faster)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair question. I don't know the answer to this, as it is a change I was asked to adapt from the Cuda backend implementation that fixes a race/leak. I will double check and come back to it. Hence, I didn't change the data structure. I will double check with the team to see if the order is important and if not I'll switch to std::unordered_map
.
return ReturnValue(ur_bool_t{false}); | ||
} | ||
case UR_DEVICE_INFO_BINDLESS_SAMPLED_IMAGE_FETCH_1D_USM_EXP: { | ||
// HIP does support fetching 1D USM sampled image data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a personal preference but I prefer supports
(as in the comment(s) above) to does support
(below). I initially parsed it as does not support
.
} | ||
case UR_DEVICE_INFO_BINDLESS_SAMPLED_IMAGE_FETCH_1D_EXP: { | ||
// HIP does not support fetching 1D non-USM sampled image data. | ||
// TODO: DPC++ doesn't implement the required builtins for SYCL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to have a TODO here if HIP doesn't support it anyway?
|
||
if (memType == hipMemoryTypeArray) { | ||
// HIP doesn not provide async copies between host and image arrays | ||
// memory in versions early than 6.2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// memory in versions early than 6.2. | |
// memory in versions earlier than 6.2. |
|
||
if (memType == hipMemoryTypeArray) { | ||
// HIP doesn not provide async copies between image arrays and host | ||
// memory in versions early than 6.2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// memory in versions early than 6.2. | |
// memory in versions earlier than 6.2. |
cpy_desc.dstY = pCopyRegion->dstOffset.y; | ||
cpy_desc.dstMemoryType = hipMemoryTypeHost; | ||
cpy_desc.dstHost = pDst; | ||
if (pSrcImageDesc->rowPitch == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay that cpy_desc.srcPitch
is uninitialized in this path? I might have missed similar instances of this question before/after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think here we want it (srcPitch
) zero-initialised. I thought it would be init to that, originally, from the looking at the hip_Memcpy2D cpy_desc = {};
construction (another one taken from the Cuda adapter counterpart). That was on the idea that, assuming that's a POD type, it's all zero-init. However, if my understanding is wrong it may be better to do it explictly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cuda CU_MEMORYTYPE_ARRAY
, which is equivalent to hipMemoryTypeArray
, arrays are opaque memory and do not have (at least exposed) notion of pitch.
So it is not set as it is not expected for the pitch field to be used at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, after double checking this looks correct. Actually, thank you for clarifying this @DBDuncan!
I think Fraser's concern was also a little more general here as well - that we might end up with a garbage value if left unitialised (which isn't good practice) - ideally we want it zero-init in such cases. Luckily, since C++11 with the brace-init syntax via the uniform init-brackets {}
that are used for cpy_desc
here (considering hip_Memcpy2D
is a POD or trivially default-constructible), the members that are not explicitly initialised should get zero-init (or nullptr
for pointers).
extMemDesc.type = hipExternalMemoryHandleTypeOpaqueWin32; | ||
break; | ||
case UR_EXP_EXTERNAL_MEM_TYPE_WIN32_NT_DX12_RESOURCE: | ||
// Memory descriptor flag values such as hipExternalMemoryDedicatedare |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Memory descriptor flag values such as hipExternalMemoryDedicatedare | |
// Memory descriptor flag values such as hipExternalMemoryDedicated are |
case UR_EXP_EXTERNAL_SEMAPHORE_TYPE_WIN32_NT_DX12_FENCE: | ||
extSemDesc.type = hipExternalSemaphoreHandleTypeD3D12Fence; | ||
break; | ||
case UR_EXP_EXTERNAL_SEMAPHORE_TYPE_OPAQUE_FD: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this case falling through to default
would be better served with a [[fallthrough]]
annotation. Or left out? There's at least one more case of this further up if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll prefer a [[fallthrough]]
annotation.
This PR implements the entry points in the HIP adapter for the Bindless Images extension.
Some features, while implemented, only partially pass the End-to-end tests for them in DPC++ (the PR that enables the support and testing in DPC++: intel/llvm#16439). This is because the HIP runtime doesn't support, for example, some parameter combinations with some image memory types. The not-fully supported features are marked with
TODO
s in the device info queries for further work.Signed-off-by: Georgi Mirazchiyski [email protected]
Co-authored-by: Peter Žužek [email protected]